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The mass-function of dwarf satellite galaxies that are observed around Local Group galaxies 
substantially differs from simulations^ based on cold dark matter: the simulations predict 
many more dwarf galaxies than are seen. The Local Group, however, may be anomalous in 
this regard 6 7 . A massive dark satellite in an early-type lens galaxy at z = 0.222 was recently 
found 8 using a new method based on gravitational lensing 9 10 , suggesting that the mass frac- 
tion contained in substructure could be higher than is predicted from simulations. The lack 
of very low mass detections, however, prohibited any constraint on their mass function. Here 
we report the presence of a 1.9 ± 0.1 x 10 a M Q dark satellite in the Einstein-ring system 
JVAS B1938+666 (ref. 11) at z = 0.881, where M denotes solar mass. This satellite galaxy 
has a mass similar to the Sagittarius 12 galaxy, which is a satellite of the Milky Way. We deter- 
mine the logarithmic slope of the mass function for substructure beyond the local Universe to 
be o. — I.I^q 4, with an average mass-fraction of (f) — 3.3^ g %, by combining data on both 
of these recently discovered galaxies. Our results are consistent with the predictions from 
cold dark matter simulations^^ at the 95 per cent confidence level,and therefore agree with 
the view that galaxies formed hierarchically in a Universe composed of cold dark matter. 

The gravitational lens system JVAS B 193 8+666 (ref. 11) has a bright infrared background 
galaxy at redshift 2.059 (ref. 16), which is gravitationally lensed into an almost complete Einstein 
ring of diameter ~ 0.9 arcseconds by a massive elliptical galaxy at redshift 0.881 (ref. 17). The 
bright, highly-magnified Einstein ring made this system an excellent candidate in which to to 
search for surface brightness anomalies caused by very low mass (dark matter) substructure in 
the halo around the high redshift elliptical lens galaxy. The presence of a low-mass substructure 
(e.g. a luminous or dark satellite galaxy; also denoted as substructure hereafter) in the lens galaxy 
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can introduce a localized perturbation of the arc surface brightness distribution. Owing to the 
multiplicity of the gravitationally lensed images that form these arcs, these 'surface brightness 
anomalies' can be analysed using a pixelized-lens modelling technique and used to gravitationally 
detect and quantify the total mass and position of the substructure, down to masses as low as 
~0.1% of the mass of the lens inside the Einstein radius 9 ™ The lens system was imaged at 1.6 
and 2.2 micron using the Near Infrared Camera (NIRC2) on the W. M. Keck 10-m telescope in 
June 2010. The adaptive optics system was used to correct the incoming wavefront for the blurring 
induced by the atmosphere, providing a nearly diffraction-limited point spread function (PSF) with 
a full width at half maximum (FWHM) of ~70 milli-arcseconds. Further details for the data sets 
and their image processing can be found in Supplementary Information. 

A smooth parametric model for the lens potential was constrained by using the surface 
brightness emission from the Einstein ring of each data set independently, having first removed 
the smooth light contribution from the lensing galaxy. We represented the mass model using an 
ellipsoidal power-law, p(r) oc r~ 7 , where p(r) is the combined luminous and dark matter density 
as a function of the ellipsoidal radius r. The best-fitting model was then fixed and further refined 
using local potential corrections defined on a regular grid, which are translated into surface density 
corrections with the Laplace operator. We found for both the 1.6 and 2.2 micron adaptive optics 
data sets that there was a significant positive density correction, which indicated the presence of a 
mass substructure (see Fig. 1 and Supplementary Information). Directly from the pixelized poten- 
tial correction, we measured a substructure mass of ~ 1.7 x 10 8 M & inside a projected radius of 
600 pc around the density peak. 

As an independent test, we repeated the analysis of the 2.2 micron data set, which had the 
highest-significance positive density correction, with different models of the PSF, different data 
reduction techniques, different rotations of the lensed images, different models for the lens galaxy 
surface brightness subtraction, and different resolutions for the reconstructed source. We also anal- 
ysed an independent data set taken at 1.6 micron with the Near Infrared Camera and Multi-Object 
Spectrograph (NICMOS) onboard NASA's Hubble Space Telescope. In total, we tested fourteen 
different models, and three different data sets that all independently led to the detection of a pos- 
itive density correction at the same spatial position, although with varying levels of significance 
(see Supplementary Information). Differential extinction across the gravitational arc could also 
produce a surface brightness anomaly. However, the colour of the arc was found to be consistent 
around and at the location of substructure, ruling out the possibly that dust affected our results. 
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Figure 1: The detection of a dark-matter dominated satellite in the gravitational lens system 
B1938+666 at redshift 0.881. The data shown here are at 2.2 micron and were taken with the 
W. M. Keck telescope in June 2010. Additional data sets at 1.6 micron, from the Keck tele- 
scope and the Hubble Space Telescope, are presented in the Supplementary Information. Top-left 
panel: the original data set with the lensing galaxy subtracted. Top-middle panel: the final re- 
construction. Top-right panel: the image residuals. Bottom-left panel: the source reconstruction. 
Bottom-middle panel: the potential correction from a smooth potential required by the model to 
fit the data. Bottom-right panel: the resulting dimensionless projected density corrections. The 
total lensing potential is defined as the sum of an analytic potential for the host galaxy plus the 
local pixelized potential corrections defined on a Cartesian grid. The potential corrections are a 
general correction to the analytical smooth potential and correct for the presence of substructure, 
for large-scale moments in the density profile of the galaxy and shear. When the Laplace opera- 
tor is applied to the potential corrections and translated into surface density corrections, the terms 
related to the shear and mass sheets become zero and a constant, respectively. A strong positive 
density correction is found on the top part of the lensed arc. Note that these images are set on 
a arbitrary regular grid that has the origin shifted relative to the centre of the smooth lens model 
by Ax = 0.024 arcsec and Ay = 0.089 arcsec. When this shift is taken into account the position 
of the density correction is consistent with the position of the substructure found in the analytic 
re-construction (see Supplementary Information). 
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We used an analytic model to determine the mass and the statistical significance of the sub- 
structure in the context of a physical modeP^. In this analytic approach, a truncated pseudo-Jaffe 
model was used to parameterize the substructure mass and position, giving three extra free param- 
eters. To obtain a good fit to the observed surface brightness distribution of the lensed background 
source at 2.2 micron, we found that a substructure was required at a position consistent with the 
positive density correction detected above. Assuming the substructure to be situated in the plane 
of the lens galaxy, we objectively compared the smooth and the substructure parametric models 
in terms of the Bayesian evidence, which is a measure of the probability of the data given a spe- 
cific model (mariginalized over all parameters). Computing the marginalized evidence involved 
integrating over the multidimensional parameter space within predefined priors. In particular, we 
assumed that the substructure was equally likely to be located at any point in the lens plane and to 
have a mass between 4.0 x 10 6 M and 4.0 x 10 9 M , the mass range where comparison with sim- 
ulations was possibleP^. See Supplementary Information for further details on the Bayesian evi- 
dence. We found that our nominal model, with a substructure of mass M sub = 1.9 ± 0.1 x 10 8 M 
located at (0.036 ± 0.005, 0.576 ± 0.007) arcseconds relative to the lensing galaxy (in sky coor- 
dinates defined positive towards the north and the west, respectively), was preferred by a factor 
e 65 over a smooth model. This would heuristically correspond to a 12-ct detection of the substruc- 
ture, if the posterior probability distribution function were Gaussian. This agrees well with the 
substructure mass found by the pixelized potential correction method described above, given the 
systematic and statistical uncertainties (see Supplementary information). 

We also considered a model containing two substructures. The corresponding Bayesian ev- 
idence was significantly lower than the Bayesian evidence of the smooth model and the single 
substructure model. Above, we have quoted the statistical uncertainties for the substructure mass 
and position, but the uncertainties will also be related to several sources of systematic error, which 
are discussed in Supplementary Information. In general, the uncertainty in the substructure mass 
(under the assumption that it is located in the lens plane) is entirely dominated by the inference in 
the tidal truncation radius, which requires the substructure to be de-projected, yielding a systematic 
uncertainty of 0.45 dex. 

On the basis of the detection of the substructure, we calculated the joint posterior probability 
function for the projected mass fraction of the halo that is made up of substructure^, /, and the 
slope, a, of the substructure mass function, (where dN oc dMM~ a , where N is the number 
density of subhaloes per comoving volume and M is the substructure mass; see Supplementary 
Information). For the JVAS B1938+666 lensing galaxy (M lens = 2.46 x 10 10 M within R E = 
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3.39 kpc), we found that / = 3.91J 4 % at the 68 per cent confidence level, using a uniform prior 
probability distribution for a, for a substructure mass range between 4.0 x 1O 6 M and 4.0 x 10 9 M Q . 
For a Gaussian prior probability distribution for a centred at 1.9 ± 0.1, which is the slope of the 
mass function predicted from simulations^S, / = I.S^qJ % at the 68 per cent confidence level. 
The predicted fraction of substructure from simulations within the same mass range and projected 
distance from the host halo centre is ~ 0.1 jl^f % (ref. 19), which is marginally smaller than 
the lower limit implied by our detection of a substructure at the 68 per cent confidence level, 
independent of the prior set on the slope of the mass function. However, these simulations model 
the formation of a Milky Way halo at z = and simulations of elliptical galaxies out to z ~ 1 
must be made before a more quantitative conclusion can be drawn. 

Whereas flux-ratio anomalies of multiply imaged quasars have previously been used to sta- 
tistically measure the level of substructure in cosmologically distant lens galaxies^^, these anal- 
yses have degeneracies when the substructures are dark and it is difficult to localize and measure 
the masses of individual substructures. Hence, the shape of the substructure mass function can- 
not be constrained and so far it has remained unclear whether the results from quasar flux ratio 
anomalies are in agreement or disagreement with numerical simulation^. Our new low-mass de- 
tection also allows us to constrain for the first time the slope of the substructure mass function for 
galaxies other than our own, when combined with the detection of the ~ 18-fold more massive (i.e. 
3.5 x 1O 9 M ) substructure in the elliptical lens galaxy SDSS J0946+1006 (M lcns = 2.45 x 10 10 M 
within Re = 4.60 kpc) at redshift 0.222 that was previously reported 8 . Combining both detec- 
tions resulted in a slope of a = l.lt i for the mass function and an average mass-fraction of 
(/) = 3.3i^g % for elliptical galaxies at the 68 per cent confidence level. These results suggest 
that the slope of the mass function for elliptical galaxies is similar to that observed for Milky Way 
satellites, but that elliptical galaxies have higher mass fractions. 

So far, detailed studies of the mass and luminosity properties of substructures have been 
limited to the Milky Way^HSHSO an( j to some extent also to M31 (ref. 27) at a distance of ~1 Mpc 
from the Milky Way. About thirty luminous satellite galaxies detected within the Milky Way 
virial radius (~ 250 kpc) are considered possible cases of cold dark matter substructure, albeit at 
a much lower abundance than is predicted from simulations 15 . Twenty-three of the Milky Way 
satellites have been found to have masses of ~ 10 7 M within a 300-pc projected radius, from 
observations of the dynamics of their starPH This method for determining masses is limited to 
the Local Group owing to the faintness of the satellite galaxies, and can have a systematic error 
if the satellite is not in dynamic equilibrium or if there is foreground contamination. The three- 



5 



dimensional mass of the JVAS B1938+666 substructure within a radius of 300 pc is M 300 = 
7.2 ± 0.5 x 10 7 M Q (M 30 o = 3.4 x 10 7 M & for a singular isothermal sphere model), which is 
comparable to the masses of the Milky Way satellites 28 , but is hosted by an elliptical galaxy with 
a velocity dispersion of cr SIS = 187 km s _1 at a co-moving distance of ~3 Gpc, corresponding 
approximately to a time when the Universe was half the age it is today. A 3-a upper-limit of 
L v < 5.4 x 10 7 L V:Q was found for the luminosity of the substructure in the rest-frame V-band 
within the tidal radius r t = 440 ± 5 pc. This is about a factor four brighter than Sagittarius and 
FornaxPl The velocity dispersion of the satellite, based on the Einstein radius found from the best 
fitting model, is o v fa 16 km s _1 , which corresponds to a circular velocity of V^ rc ~ 27 km s _1 . Its 
three-dimensional mass within 600 pc is M 60 o = 1.13 ± 0.06 x 10 8 M . The observed properties 
(mass and circular velocity) are comparable to the Sagittarius dwarf galaxy of the Milky Way^. 

The mass-to-light ratios of low-mass satellites around the Milky Way have also been found 
to disagree with the expectations from simulations. It has been predicted that satellites with the 
luminosity of Fornax and Sagittarius should have a velocity dispersion that is a factor of two higher 
than has been observed (~ 16 km s -1 ), which represents another problem for the cold dark matter 
paradigm 6 . In the case of the JVAS B 1938+666 substructure, the 3cr upper limit to the luminosity is 
in the range of the Fornax and Sagittarius satellites, and the resulting mass-to light ratio has a lower 
limit of > 3.5 M Q /L VjQ . Although this result is consistent with the mass-to-light ratios predicted 
from simulations, the limit on the luminosity of the substructure will need to be lowered by an 
order of magnitude before any meaningful comparison can be made. This will only be possible 
with the next generation of large optical telescopes, which is planned to come into operation by 
the end of the decade. 
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Supplementary Information 



Observations and data reduction 

The data used for this work were taken as part of the Strong-lensing at High Angular Resolution 
Program (SHARP) and will be discussed in detail by Lagattuta et al. (in preparation); here we 
present only a concise summary. We observed the B1938+666 lens system (ref. 11) on 2010 
June 29 and 30, using the Near Infrared Camera 2 (NIRC2) with the adaptive optics (AO) system 
mounted on the Keck II telescope. To achieve the best possible image resolution, we used the 
NIRC2 "narrow" camera, with a 100 square arcsecond field-of-view and a 0.01 arcseconds pixel 
scale. The tip-tilt AO correction was achieved by simultaneously observing an R = 15 magnitude 
star that was located 18 arcseconds away from the target, while the secondary AO correction was 
achieved with a 10-Watt, A = 589 nm sodium laser guide star. We collected 82 x 180 second 
exposures of B 1938+666 in the K' band (2.2 micron), and 22 x 300 second exposures in the H 
band (1.6 micron). 

The two data sets were reduced independently. We first flat-fielded the data using an em- 
pirical sky flat that was created by median stacking all of the raw science frames. In order not to 
bias the median sky flat, we made sure that all of the pixels that contained astronomical flux were 
masked out during this process. After flat- fielding, we subtracted a median sky background from 
each image and then spatially resampled the images to remove the geometric distortion. The image 
registration was achieved using a pixel-based cross-correlation technique. We used the sub-pixel 
shifts measured from the cross -correlation to drizzle each frame onto a common coordinate system. 
Finally, we median stacked the reduced individual frames for a second time, in order to remove any 
bad-pixel artefacts and cosmic ray relics. This median stacked image was our final output frame. 

We also made a separate reduction using the Center for Adaptive optics Treasury Survey 
(CATS) pipeline to test any systematics in the data reduction process. 

For our analysis, we also used imaging data of B1938+666 that were taken with the Hubble 
Space Telescope (HST) at 1.6 microns (F160W) with the Near Infrared Camera and Multi-Object 
Spectrograph (NICMOS). These data were previously published by King et al. (ref. 11) and we 
refer to that paper for further details. The data were gathered from the HST archive and reduced 
using the multidrizzle package 29 to produce the final images. 
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Lens galaxy surface brightness subtraction 

In order to remove the smooth light contribution of the lensing galaxy from the imaging data of 
B 193 8+666, we subtracted a model of the surface brightness profile that was fitted to the data 
using GALFIT 30 . The model for the lensing galaxy was made using Sersic profiles. To prevent 
over- subtraction of the Einstein ring itself, we masked all of the pixels that contained any high 
surface-brightness flux from the background galaxy when we optimised the model. 

However, the seeing-limited component of the PSF is broad compared to the separation be- 
tween the lensing galaxy and the source images and this causes some over-subtraction due to the 
GALFIT model fitting away part of the low surface brightness flux of the source images. We 
therefore implement a second subtraction scheme 31 to try to explicitly model all of the source light 
along with the lens light. We fit two Sersic components to both the lens and the source and use a 
singular isothermal ellipsoid model for the lensing kernel. This model is optimised and the best-fit 
lensing galaxy surface brightness model is then subtracted from the data, leaving just the source 
flux. 

Dust in the lens galaxy 

Differential dust extinction within the lens galaxy can change the surface brightness of the lensed 
images and produce a surface brightness anomaly. In the case of B 1938+666 the lens galaxy is 
a massive elliptical, which tend not to have large amounts of dust or show significant reddening. 
To estimate what effect, if any, dust has on the surface brightness of the lensed images, the colour 
of the 1.6 and the 2.2 micron data sets were measured at the location of the substructure, and at 
those parts of the arc away from the substructure position. We found the colour difference between 
these locations to be consistent within 0.01 -mag, ruling out the effect of dust at a level exceeding 
the noise. We also note that if the anomaly were due to dust extinction, we would expect it to be 
considerably stronger in the H-band than in the K-band (note that the H-band is 0.8 micron in the 
rest-frame where dust can still have an effect) and hence the inferred masses to be very different. 
Instead they agree very well between the two bands. 
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Bayesian Lens modelling 



We modelled the lens system using a recently developed grid based modelling technique 9 ' 10 . This 
technique is fully embedded within the Bayesian statistical formalism that allows the most prob- 
able a-posteriori lens potential and background source to be determined. The method allows the 
level of structure in the source to be objectively inferred and for different model families to be 
compared/ranked. The method is grid based in the sense that both the lensed images and the back- 
ground source surface brightness distribution are reconstructed on pixelized grids. While the image 
grid is a regular Cartesian grid defined by the data resolution, the source is reconstructed using a 
Delaunay tessellation; the source grid automatically adapts with the lensing magnification. The 
source grid is built from the lens plane grid using the lens equation and a variable number of lens 
plane pixels that are set on a case by case basis. The optimal number of lens plane pixels can be 
inferred through the Bayesian evidence. 



Initially, we considered a smooth lens galaxy parametrized as a softened power-law elliptical 
model with an external shear of strength, Y and angle, IV The projected surface mass density 
depends on the following parameters: the lens strength b, the position angle (defined north to east) 
6, the projected flattening q, the centroid of the lens mass distribution xo, yo, the core radius r c , and 
the density slope 7, and is defined (assuming for clarity here a position angle of 90 degrees) as, 



2=1 . . 2 . 1=2 



«(r) = S — (^-^o) 2 + ^^ + r c 2 . (1) 



The smooth lens model and corresponding background source were determined by optimising all 
of the above parameters with the exception of the core radius, which was set to the small value of 
r c = 0.0001 (we note that the smooth model was only used as a starting point for the modelling 
and the choice of setting r c to a small value is not relevant) in order to maximize the following 
Bayesian posterior probability, 

logP(s|77,A fl ) = - Qx 2 + A 2 ||H s s||^ + const., (2) 

In Eq. 0, s is the source surface brightness distribution and X s its regularization, 77 is a vector 
containing all of the model parameters introduced above. The source regularisation A s sets the 
level of smoothness of the source and its value was objectively set by maximizing Eq. ©. Several 
forms of regularisation, H s , can be adopted, and in this paper we chose to use a gradient regular- 
isation. Note that because of the presence of noise and because of the extended emission of the 
background source, lens modelling in terms of the Bayesian penalty function instead of a simple 
X 2 is essential 10,32 . 
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In order to quantify systematic effects on the lens modelling, we considered three different 
data sets; the 1.6 and 2.2 micron data sets from the Keck telescope and the 1.6 micron data set 
from the HST. The most probable a-posteriori models for these data sets are listed in Tables 1 
and 2, respectively, as M H , M K and M H st- Moreover, we considered twelve different models 
for the 2.2 micron data set, of which M iH _ 4 used a different point spread function (PSF) model, 
M 5 used a higher number of lens plane pixels for the source grid, M 6 had a source regularisation 
that is adaptive with the signal-to-noise ratio of the images, M 7 used the CATS data-reduction 
pipeline, M 8 used a data-set rotated by forty-five degrees, M SIS assumed a singular isothermal 
density profile for the substructure, and M 9 , M 10 and M n used three different procedures for the 
lens galaxy surface brightness subtraction. We also considered four different models for the 1.6 
micron data set from Keck using four different lens surface brightness subtraction models, Mi 2 , 
M 13 and M 14 . All of these different data sets and models led to a consistent lens potential and 
source surface brightness distribution, and will be presented by Lagattuta et al. (in preparation). 
Once the best smooth lens model was found, mass substructure in the lens galaxy was searched 
for by allowing for local potential corrections. The lens potential was then defined as the sum of a 
power-law elliptical model and pixelized corrections defined on a regular Cartesian grid, 

ip (x, rj) = Smooth (x, rj) + 5ip (x) . (3) 

The Bayesian posterior, 

\ogP(s\ri,\ s ,\ 5 ^) = -^x 2 - |H s s| |1 - A^||Hi^lll + const., (4) 

was optimised in an iterative fashion for the source surface brightness distribution and the po- 
tential corrections, while the smooth potential component was kept fixed at the best fitted value 
determined from the smooth lens model analysis described above. 

Substructures appeared as positive density corrections. This allowed the location and the 
number of possible substructures in a specific lens potential to be quantified. Previous simulations 
have shown that we can detect more than one substructure in the same system when located at a 
position where they can affect the lensed images 10 . All three independent data sets and considered 
models showed a positive potential correction located at the same position, see for example Fig. 
1 of the Letter and Figs. 2, 3 and 4 of this Supplementary Information. Each data set had a 
different resolution, signal-to-noise, and was observed at a different wavelength, this excluded the 
possibility that our detection was an artefact related to the dynamic range of the images, dust, 
cosmic rays, bad pixels or other systematic effects. The reason that the substructure position is 
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correlated with the brighter regions of the Einstein ring, is related to the fact that brighter regions 
are more sensitive to surface brightness perturbations and are therefore the regions where we are 
more likely to detect substructure with this method, if they are indeed present. It is possible that 
this lens system contains other equally low-mass or lower-mass substructure that we are not able 
to detect because of their less favourable location. 

In some cases, we also find a second smaller density peak in the lower left part of the images. 
We investigated this second peak and find it not to be significant, for the following reasons. First, 
this peak lies near the edge of the reconstruction region where the potential corrections are gener- 
ally affected and non-zero because of regularisation effects 33 . Second, the marginalised Bayesian 
evidence (see following section), disfavours a model with either a single substructure located at the 
position of the second density peak, or a model with two substructures, one for each peak, over a 
model with only a single substructure at the position of the main peak. Finally, an analytical model 
(see below) with a substructure forced at the position of the second peak gives a best recovered 
substructure mass consistent with zero. 

As a final comment we point out that potential correction maps are a general correction to 
the analytical smooth potential and correct therefore not only for the presence of substructure, but 
also for large-scale moments in the density profile of the galaxy and shear. The underlying smooth 
analytical models are different in each data set and model, and therefore require different levels 
of correction in terms of shear and mass sheet. This can cause the potential correction maps to 
look different for each data set or model. However, when the Laplace operator is applied to the 
potential correction map to get the convergence correction map the terms related to the shear and 
the mass sheets become zero or an irrelevant constant (the latter is in general forced to zero due to 
the curvature regularization), respectively. 

Once the number of significantly detected substructure was determined to be one, we quan- 
tified its mass using a model where both the host lens galaxy and the mass substructure had 
a parametrized density profile. This was done to stabilise the mass determination and place 
it in the context of a parametric model that was used previously. As before, the lens galaxy 
was parametrized with a power-law elliptical profile [see Eq. dQ], while the substructure was 
parametrized using a pseudo-Jaffe truncated profile defined by, 

«(r) = ^[r- 1 -(r 5, + rf)- 1 ^] ) (5) 

where r t is the substructure tidal radius and 6 sub is the lens strength; both are related to the main 
galaxy lens strength b and to M sub by r t = y/b suh b and M sub = 7rr t 6 sub E c . This assumed that 
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the projected and the true position of the substructure relative to the host lens galaxy were the 
same. While this can be on average considered correct, the uncertainty on the true position of the 
substructure is a considerable source of systematic error on the substructure mass, as we discuss 
in more detail below. Combining the last two relations leaves the total mass and position of the 
substructure on the lens plane as free parameters for the substructure model. By maximizing the 
Bayesian posterior probability, Eq. ©, in terms of 77, A s , the substructure mass and substructure 
position, we found the best substructure models listed in Tables 1, 2 and 3. All three data sets and 
different models gave consistent values for the lens galaxy and substructure parameters. More- 
over, in all cases the recovered substructure position was consistent with the grid-based density 
correction previously identified. We also considered a model where the substructure had a singular 
isothermal profile (Msis, which is equivalent to a pseudo-Jaffe with r t — > inf). We found it to be 
located at (0.04, 0.59) arcseconds relative to the lens galaxy, consistent within the uncertainties of 
all other models, and with a lens strength b = 0.003 arcsec, which corresponded to a SIS velocity 
dispersion of a v = 15.6 km s _1 . 

The Substructure Mass from the Potential Grid Correction 

The mass determination of the substructure from the analytic (pseudo-Jaffe) model is robust and 
within the context of a physical model. A nearly model-independent check of this mass deter- 
mination, is to measure it directly from the grid-based potential correction. To ensure a correct 
comparison of the derived masses - since some of the substructure surface density could be due to 
a correction to the global smooth model - we use the grid-based potential solution starting from 
the same smooth mass model as in the fully analytic model. A straightforward estimator comes 
from summing over pixels in the convergence map, which one determines through the Poisson 
equation, given in dimensionless units by V 2 Sip = 25k. For 3x3 pixels centred on the conver- 
gence peak, which is similar to a ~600 pc aperture, we find a mass of 1.6 x 1O 8 M . This is in 
excellent agreement with the analytic model, when correcting for the fact that this measurements 
is a projected mass and given the uncertainties. Because the convergence (i.e. surface density) map 
is directly based on the potential correction through finite differencing, its errors are correlated and 
harder to estimate. To reduce the effect of finite differencing, we apply Gauss' theorem directly to 
the potential correction grid, by integrating the in-product of V8ip with the normal vector (pointed 
inward) of a circular curve centred on the convergence peak. Tests show that this method suffers 
little from discretisation of the grid. This way, we find a mass of 1.7 x 1O 8 M inside an exact 
600 pc projected radius, again in good agreement with the direct method and the analytic model. 
Second order differences could still be due to the choice of the substructure density profile, but the 
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agreement shows these effects to be small. We are therefore confident that both the grid-based and 
analytic models give robust results, in agreement with each other within the errors. Throughout 
our analyses, for consistency, we will use only the mass determinations from the analytic model. 

Model comparison 

Models with a different number of degrees of freedom can be consistently and objectively com- 
pared in the context of Bayesian statistics in terms of the marginalized Bayesian evidence. For a 
given model M and a regularisation form R, the marginalized Bayesian evidence is defined as, 



This marginalized Bayesian evidence provides a measure of the probability of a specific model 
given the data d. If the prior probability P(M, R) is flat then different models can be compared 
according to their value of P(d | M, R), which is related to the Bayesian evidence by, 



P(\ s ,r)) is the prior probability distribution on all of the model parameters. In this Letter, we 
set all of the lens parameters in rj to have uniform priors centred around the best fitted values of 
the smooth M K data set and as wide as 1 up to 2 orders of magnitude larger than the dispersion 
estimated from all of the different Mi data sets and models, depending on the lens parameter. We 
note that the priors for all models are identical. The source regularisation constant had a uniform 
prior in the logarithmic space. The prior on the substructure position was uniform within the 
lens plane, that is, the substructure had an equal probability of being at any position in the lens 
plane and was not forced at any time to be at the same position as the previously detected density 
correction. This can be considered as an extra test of the substructure detection. The prior on 
the substructure mass was also uniform between 4 x 10 6 M and 4 x 10 9 M Q . To allow for a 
proper comparison, the priors had to be the same for both the smooth and substructured models. 
A by-product of this integration is an exploration of the posterior probability, allowing for an 
error analysis of the model parameters and of the evidence itself. The results of this analysis for 
the 2.2 micron data set, which had the largest significance of all data sets, are given in Table 2, 
model M. In Fig. 5, we show the posterior probability distribution for the substructure mass and 
position. We found that a model with a substructure at (0.036 ± 0.005, 0.576 ± 0.007) arcseconds 
relative to the lensing galaxy and with a mass of (1.9 ± 0.1) x 10 8 M Q was statistically preferred 
over a smooth model with a difference in the marginalized Bayesian evidence (Bayes factor) of 
A log£ = \og£ smooth — \og£ sub = —65.10, which corresponds to a 12 a detection, if the posterior 
probability distribution function were Gaussian. 



£ = P(M, R | d) oc P(d | M, R)P(M, R) . 



(6) 




(7) 
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Systematic error on the substructure mass 

Two major classes of systematic errors affect our measurement of the substructure mass. The first 
one is related to the modelling of the data and the data themselves, namely to the PSF modelling, 
the lens galaxy surface brightness subtraction, the data noise and resolution. The second is related 
to the de-projection of the substructure position. As discussed above, in order to quantify the 
effect related to the first class of systematic error we have analysed three independent data sets and 
fourteen different models and found an uncertainty of 0.2 x 1O 8 M . 

Two major assumptions about the physical position of the substructure were made when 
measuring its mass. First, we assumed that the substructure is physically associated with the lens 
galaxy and therefore lies at the same redshift. However, current high resolution simulations mostly 
concentrate on the formation of Milky-Way type of haloes and a precise quantification of the 
line-of-sight contamination on surface brightness anomalies for massive elliptical galaxies is still 
lacking. Second, we assumed that the projected separation between the lens and the substructure 
is equivalent to the true separation. In general the true separation will be larger than the projected 
separation and the above inferred value represents a lower limit on the total mass, but is still a good 
representation of the mass within a fixed radius of r = r t . 

The posterior probability for M su b is computed using the relation M su b = 7rr t 6 su bS c . If we 

1 /3 

use the tidal radius as the truncation radius then r t = r s (4^) (ref. 34) where r s is the true 
separation between the galaxy and the substructure, and M is the mass of the galaxy within r s . 
For B1938+666 this implies M sub /M ps nr t b suh b~ 2 and hence r t = J b ij2\J'xb sub /'ib. If we assume 
that galaxy-scale substructures follow the total mass distribution, as is found in simulations and 
observations of more massive substructures 35,36 , then r s will have a distribution that falls off ap- 
proximately as r~ 2 . However, we also have the constraint that the projected separation is r p , which 

yields a distribution of r s that goes as (r^ (r/r p ) 2 — 1 j (ref. 37). As M sub oc r s , this implies 
the same functional form for the posterior probability of M sub , neglecting the very small uncer- 
tainty on 6 sub ; this posterior inference on M sub is shown in Fig. 6. We find that the de-projection 
yields a systematic uncertainty on the total mass of 0.45 dex at the 68 per cent confidence level. 

Substructure luminosity 

We estimated an upper limit on the luminosity of the substructure by calculating the quadrature 
sum of the pixel noise in a 1 1 pixel x 1 1 pixel square aperture. The aperture width was chosen 
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to match the tidal diameter of the substructure. Two separate estimates of the pixel noise were 
used, and the results were consistent. We found that the 3-a upper limit on the observed apparent 
magnitude was K' > 28. Using a K-correction of —1.34, we found that the rest-frame V-band 
absolute magnitude 3-a limit was My > —14.5, which corresponded to a luminosity limit of 
Ly < 5.4 x lO 7 Ly )0 . This limit is a factor of four brighter than the luminosities of the Fornax 
and Sagittarius dwarf galaxies in the Local Group 12 , and therefore similar luminosity galaxies at 
z ~ 0.9 would not be observable in our deep adaptive optics imaging. 

Statistics of detections 

The measurement of a mass substructure can be statistically formalised and used to set constraints 
on the substructure projected mass fraction / and the substructure mass function slope a in lens 
galaxies 18 . As we are sensitive only to substructures that are either massive enough or close enough 
to change the surface brightness distribution of the lensed arc, / is defined as the projected dark 
matter mass fraction within an annulus of 0.1 arcseconds around the Einstein radius of the lensing 
galaxy, (3.38 ± 0.76) kpc, where our sensitivity is maximized. We defined the probability that 
the observed substructure belongs to a parent population characterised by a certain fraction and a 
certain mass function slope as, 



C (n s , m 


a,f,p)P(a,f\ 


P) 


P (n s , m 


P) 



(8) 



where p = {M min , M max , M iow , M high }. 

The likelihood probability function depends on the minimum and maximum mass a sub- 
structure can have (M min = 4.0 x 1O 6 M and M max = 4.0 x 1O 9 M ) as set by current numerical 
simulations 13 ' 14 , on the highest mass we can detect (M high = M max ), and on the lowest mass we 
can detect (Mi ow = 3.0 x 1O 7 M = 3.0 x cr Msub ). Note that higher mass satellite galaxies would 
probably be visible in the 2.2 micron data set as luminous substructure 38 , and hence, would be 
included in the lens model as a secondary galaxy. P (a, f | p) is the prior probability function for 
a and /. We assumed for the mass fraction a uniform prior between and 100 percent. For the 
prior on the mass function we considered two cases. In the first scenario, we defined a Gaussian 
prior centred on 1.9 that had a standard deviation of 0.1 (ref. 15; Fig. 7; black curves). In the 
second scenario, we chose a uniform prior between —1.0 and 3.0 (Fig. 6; blue curves). From 
the marginalized probabilities, P (/ | n s — 1, m — M su b, p) and P (a \ n s — 1, m — M su b, p), we 
derived / = 3.9l 3 ;4 % and a = l-O^; 8 at the 68 per cent confidence level (CL) for a flat prior on 
a, and / = 1.5lJ;g % and a = 1.9^o { at the 68 per cent CL for a Gaussian prior. 
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Parameter 


M K 


Mi 


M 2 


M 3 


M 4 


M 5 


M 6 


b (arcseconds) 


0.452 


0.450 


0.450 


0.450 


0.451 


0.448 


0.449 


9 (degrees) 


-22.29 


-21.32 


-21.87 


-20.66 


-20.21 


-23.30 


-22.65 


Q 


0.866 


0.868 


0.867 


0.870 


0.869 


0.870 


0.856 


7 


2.042 


2.043 


2.045 


2.046 


2.045 


2.049 


2.049 


r 


0.015 


0.021 


0.018 


0.020 


0.018 


0.016 


0.015 


T e (degrees) 


107.9 


108.4 


109.0 


108.9 


109.1 


109.3 


104.8 


M sub (10 8 M ) 


2.1 


2.6 


1.9 


2.1 


2.1 


2.1 


1.8 


x sub (arcseconds) 


0.034 


0.039 


0.031 


0.033 


0.033 


0.035 


0.043 


y su b (arcseconds) 


0.571 


0.584 


0.544 


0.574 


0.570 


0.578 


0.588 


logA s 


-3.000 


-2.990 


-3.250 


-3.203 


-3.731 


-3.853 


0.009 



Table 1 : Best-fit parameters for our lens model, b is the model lens strength, and parametrizes the 
Einstein ring radius (in arcseconds). 9 is the position angle of the mass of the lensing galaxy, q is its 
axis ratio, 7 is the slope of the mass profile, as parametrized in Eq. £0, T and Tg are, respectively, 
the magnitude and position angle of an external shear source, \ s measures the smoothness of the 
flux distribution of the lensed background galaxy, M sub is the mass of the substructure within the 
tidal radius and x sub , and y sub are its coordinates relative to the lensing galaxy. 

The independent detection of substructure in different lens galaxies can be combined by 
taking the product of their likelihood such that, 

C ({n s , m} I a, f, p) = JJ £ (n Sjfc , m k | a, f, p) , (9) 

k=l 

which is then used with Eq. (8) to determine the joint probability of a and /. 
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Parameter 


M 7 


M 8 


M 9 


Mio 


Mu 


M S is 


M 


b (arcseconds) 


0.452 


0.417 


0.447 


0.443 


0.446 


0.448 


0.443±0.007 


9 (degrees) 


-21.81 


-74.05 


-22.76 


-22.00 


-23.31 


-21.41 


-24.16±0.841 


Q 


0.868 


0.853 


0.859 


0.857 


0.857 


0.853 


0.872±0.003 


7 


2.044 


2.107 


2.033 


2.035 


2.037 


2.048 


2.058±0.013 


r 


0.015 


0.008 


0.017 


0.014 


0.015 


0.018 


0.012±0.001 


T e (degrees) 


106.8 


73.27 


97.91 


101.5 


99.92 


103.40 


113.2±2.521 


M sub (10 8 M ) 


2.2 


1.9 


2.1 


2.1 


2.0 


0.026 a 


1.9±0.1 


x sub (arcseconds) 


0.036 


0.340 


0.030 


0.047 


0.053 


0.037 


0.036±0.005 


y su b (arcseconds) 


0.578 


0.320 


0.553 


0.517 


0.507 


0.590 


0.576±0.007 


log X s 


-3.136 


-3.804 


-5.437 


-6.425 


-5.421 


-3.171 


-3.012±0.108 



Table 2: Same as Table 1 for the M 7)8i9) io,n and M S is models. The mean values of the posterior 
probability distribution with relative standard deviation for each model parameter are given in the 
last column. a Note that the mass is defined within the Einstein radius of the substructure (0.003 
arcseconds) and not the tidal radius. 



Parameter 


M H 


Mi 2 


Mi 3 


M14 


M H st 


b (arcseconds) 


0.423 


0.401 


0.407 


0.410 


0.438 


9 (degrees) 


-24.19 


-24.76 


-26.40 


-27.22 


-23.06 


Q 


0.856 


0.858 


0.860 


0.859 


0.855 


7 


2.089 


2.095 


2.090 


2.084 


2.066 


r 


0.014 


0.015 


0.016 


0.015 


0.016 


T e (degrees) 


111.4 


110.5 


110.2 


114.1 


105.9 


M sub (10 8 M ) 


2.0 


2.1 


2.0 


2.0 


1.9 


x su b (arcseconds) 


0.030 


0.033 


0.059 


0.052 


0.038 


y sub (arcseconds) 


0.555 


0.507 


0.529 


0.518 


0.522 


log X s 


-0.468 


-3.683 


-2.627 


-3.003 


1.642 



Table 3: Same as Table 1 for the M 12i i3 5 i4 models and the M h ,hst data sets. 
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Figure 2: The detection of a dark matter substructure as a positive pixelized density correction in 
the 1.6 micron Keck data set, M H . The top-right panel shows the original lens data, the middle 
one shows the final reconstruction while the top-left one shows the image residuals. On the second 
row the source reconstruction (left), the potential correction (middle) and the scale-free projected 
density corrections (right) are shown. The total lensing potential is defined as the sum of an analytic 
potential for the host galaxy plus local pixelized potential corrections defined on a Cartesian grid. 
Using the Laplace operator, the potential corrections can be translated into density corrections 
(V 2 5^ = 25k). A strong positive correction is found on the top part of the lensed arc. Note that 
these images are set on a arbitrary regular grid that has the origin shifted relative to the centre of 
the smooth lens model by A x = 0.012 and A y = 0.091 arcseconds. When this shift is taken into 
account the position of the density correction is consistent with the position of the substructure in 
the analytic reconstruction. 
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Figure 3: The detection of a dark matter substructure as a positive pixelized density correction 
in the 2.2 micron Keck data set rotated by 45 degrees, M 8 . Note that these images are set on a 
arbitrary regular grid that has the origin shifted relative to the centre of the smooth lens model 
by A x = 0.062 and A y = 0.048 arcseconds along the x and y axis respectively. When this shift 
is taken into account the position of the density correction is consistent with the position of the 
substructure in the analytic reconstruction. 
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Figure 4: The detection of a dark matter substructure as a positive pixelized density correction in 
the 1.6 micron HST data set, M H st- Note that these images are set on a arbitrary regular grid 
that has the origin shifted relative to the centre of the smooth lens model by A x = 0.001 and A y 
= 0.020 arcseconds along the x and y axis respectively. When this shift is taken into account the 
position of the density correction is consistent with the position of the substructure in the analytic 
reconstruction. 
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Figure 5: Posterior probability distributions for the substructure mass (left) and position relative 
to the lens centre as obtained from the Nested-Sampling evidence exploration for the 2.2 micron 
Keck data set. 



'8.0 8.5 9.0 9.5 10.0 10.5 11.0 11.5 

log! m,jm s ] 

Figure 6: Posterior on the inferred mass of the substructure, given the expected de-projection, with 
the intervals enclosing 68 per cent and 95 per cent of the posterior volume indicated by the vertical 
dashed and dotted lines, respectively. 
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Figure 7: The joint probability P(a,f\ {n s ,m},p) contours and marginalized probabilities 
P (f I { n s> m };P) an d P (a | {n s ,m},p) for a uniform prior (solid lines) and for a Gaussian 
prior in a (dashed lines) as obtained by combining the detection presented in this paper and the 
detection in the lens system SDSS J0946+1006 (ref. 8). 
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